
Upload gyp packages to S3 after building #17

Merged
harshita-gupta merged 1 commit into main from harshitagupta/upload-gyp-packages-to-s3
Apr 16, 2026

Conversation

@harshita-gupta
Member

Summary

After building native module packages (bcrypt, cld, unix-dgram, @datadog/pprof),
upload them to s3://asana-oss-cache/node-gyp/v1/ in addition to the GitHub Release.

This enables codez to fetch these packages via Bazel http_file instead of
committing ~112 MB of tarballs to git, saving ~305 MB total per checkout
(node18/node20 tarballs are dead code and will be deleted).
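For illustration only, the consuming side would declare each tarball roughly like this (the repository name, URL, and sha256 below are placeholders, not the actual `tools_repositories.bzl` entries):

```starlark
# Hypothetical sketch of the codez side; name, URL, and sha256 are
# illustrative placeholders, not the real tools_repositories.bzl values.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

http_file(
    name = "node_gyp_packages_amd64",
    urls = ["https://asana-oss-cache.s3.us-east-1.amazonaws.com/node-gyp/v1/packages_amd64_node22.tar.gz"],
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",  # placeholder
)
```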

Changes

build-node-packages.yml:

  • Added id-token: write permissions for AWS OIDC auth
  • Added bazel_arch matrix field to map x64 → amd64 for S3 naming
  • Added configure-aws-credentials step using the push_node_gyp_packages IAM role
  • Added S3 upload step that uploads packages and prints sha256 + tools_repositories.bzl snippet
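A minimal sketch of the upload step's bookkeeping (names and paths are examples, not the workflow's exact contents):

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names are examples, not the workflow verbatim.
# The real step additionally runs:
#   aws s3 cp "$TARBALL" "s3://asana-oss-cache/node-gyp/v1/$TARBALL"
set -euo pipefail

BAZEL_ARCH="${BAZEL_ARCH:-amd64}"              # mapped from the matrix's x64
TARBALL="packages_${BAZEL_ARCH}_node22.tar.gz"

# Placeholder artifact so the sketch is self-contained.
[ -f "$TARBALL" ] || echo "placeholder" > "$TARBALL"

# Print the sha256 for pasting into the tools_repositories.bzl snippet.
SHA256=$(sha256sum "$TARBALL" | awk '{print $1}')
echo "sha256 for $TARBALL: $SHA256"
```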

stage_for_s3.bash:

  • Separated packages_*.tar.gz files before the fibers processing loop
  • Previously these were incorrectly mixed into the fibers archive by the find . -name "*.gz" loop
  • Now prints their sha256 hashes for reference and removes them from the staging dir
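In sketch form (illustrative, not `stage_for_s3.bash` verbatim), the separation handles `packages_*.tar.gz` before the fibers loop's `find . -name "*.gz"` can sweep them in:

```shell
#!/usr/bin/env bash
# Sketch of the fix; directory layout and file names are demo placeholders.
set -euo pipefail

staging_dir="staging"
mkdir -p "$staging_dir"
# Demo files so the sketch is self-contained:
echo demo > "$staging_dir/packages_amd64_node22.tar.gz"
echo demo > "$staging_dir/fibers_linux_x64.tar.gz"

for pkg in "$staging_dir"/packages_*.tar.gz; do
  [ -e "$pkg" ] || continue     # glob matched nothing
  sha256sum "$pkg"              # print the hash for reference
  rm "$pkg"                     # keep it out of the fibers archive
done

# ...the original fibers processing loop over the remaining *.gz files runs here...
```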

Prerequisites

  • IAM role push_node_gyp_packages must be provisioned first: Asana/codez PR #388637
  • After that PR merges, run z permissions iam push in codez to create the AWS role

Test Plan

  • After IAM role is provisioned, trigger build-node-packages.yml manually from the Actions tab
  • Verify tarballs appear at https://asana-oss-cache.s3.us-east-1.amazonaws.com/node-gyp/v1/packages_{arch}_node22.tar.gz
  • Verify sha256 hashes match the currently checked-in tarballs in codez

Co-authored with Claude


@JackStrohm-asana left a comment


This approach breaks our rollback story. Currently the gyp binaries live in the codez repo directly, so rolling back a deploy automatically rolls back to the correct binary version. Fibers and the node tarballs preserve this property via unique S3 keys — when codez is rolled back, the URL in that revision still points to the exact binary that was tested with it.

With a fixed S3 key (packages_amd64_node22.tar.gz), every new build overwrites the previous one. If we roll back codez to a revision that references an older sha256, Bazel will fetch the current (newer) binary from that URL, get a sha256 mismatch, and fail. We'd have a broken build exactly when we need rollback to work.

The fix is the same pattern used for the other binaries: incorporate a short hash of the tarball content into the S3 key. That way each build produces a distinct, permanent S3 object, and the URL+sha256 pair in codez always refers to a specific binary that won't disappear or change.

After building native module packages (bcrypt, cld, unix-dgram, @datadog/pprof),
upload them to s3://asana-oss-cache/node-gyp/v1/ in addition to the GitHub Release.

This enables codez to fetch these packages via Bazel http_file instead of
committing ~112 MB of tarballs to git, saving ~305 MB total per checkout
(node18/node20 tarballs are dead code and will be deleted).

Changes:
- build-node-packages.yml: Add AWS OIDC auth + S3 upload step after release upload
- stage_for_s3.bash: Separate packages_*.tar.gz before fibers loop to prevent
  them from being incorrectly mixed into the fibers archive

Requires IAM role `push_node_gyp_packages` to be provisioned first
(Asana/codez PR #388637).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@harshita-gupta force-pushed the harshitagupta/upload-gyp-packages-to-s3 branch from b934c45 to 2b08492 on April 15, 2026 19:29
Member Author

@harshita-gupta left a comment


Good catch — updated. Each tarball now gets a content-hashed S3 key using the first 8 chars of its sha256 (e.g., packages_amd64_node22-bb5ac136.tar.gz). This matches the pattern used for Node binaries and fibers: each build produces an immutable artifact, and the URL+sha256 pair in codez always refers to that specific binary.
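A minimal sketch of that key scheme (variable names illustrative; the resulting hash depends on the placeholder contents, not any real artifact):

```shell
#!/usr/bin/env bash
# Sketch of the content-hashed S3 key scheme; names are illustrative.
set -euo pipefail

TARBALL="packages_amd64_node22.tar.gz"
echo "placeholder" > "$TARBALL"              # stand-in for the real tarball

SHORT_HASH=$(sha256sum "$TARBALL" | cut -c1-8)
S3_KEY="${TARBALL%.tar.gz}-${SHORT_HASH}.tar.gz"
echo "$S3_KEY"
# Each build then uploads to a distinct, immutable object:
#   aws s3 cp "$TARBALL" "s3://asana-oss-cache/node-gyp/v1/$S3_KEY"
```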

Co-authored with Claude


@JackStrohm-asana left a comment


Everything else looks good. Maybe at some point the other files that the script produces can automatically be uploaded to S3 as well.

@harshita-gupta merged commit 104256b into main Apr 16, 2026
14 of 18 checks passed
harshita-gupta added a commit that referenced this pull request Apr 21, 2026
…ck, remove --acl

Three follow-up corrections to PR #17:

1. Remove `--acl public-read` from `aws s3 cp`.
   The bucket has `disable_confusing_acls = true` (BucketOwnerEnforced), which
   disables ACLs entirely. `BlockPublicAcls` + `IgnorePublicAcls` provide
   additional coverage. The ACL flag is silently ignored. The IAM role
   (`S3_ACCESS_MODE.PUT`) also doesn't grant `PutObjectAcl`. Reads go via
   CloudFront OAC, not public-S3.

2. Replace `softprops/action-gh-release` with GitHub's first-party `gh` CLI.
   `gh release upload` is pre-installed on GitHub-hosted runners, removes a
   third-party (single-maintainer) supply-chain dependency, and behaves
   equivalently with `--clobber`.

3. Add a post-upload CloudFront reachability check (`curl -fI`).
   If the CloudFront path_patterns allowlist doesn't include the key's prefix,
   Mac Bazel builds will silently 403. Failing the workflow here surfaces the
   issue before consumers hit it.
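In sketch form (the release tag, file names, and CDN hostname are placeholders, not the workflow's actual values), changes 2 and 3 amount to steps like:

```yaml
# Illustrative workflow steps; TAG, file names, and CDN_BASE_URL are
# placeholders, not the actual workflow contents.
- name: Upload to GitHub Release
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: gh release upload "$TAG" packages_*.tar.gz --clobber

- name: Verify CloudFront reachability
  run: |
    # -I sends a HEAD request only; -f turns an HTTP error (e.g. a 403 from
    # a missing path_patterns entry) into a non-zero exit, failing the job.
    curl -fI "${CDN_BASE_URL}/node-gyp/v1/packages_amd64_node22.tar.gz"
```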

S3 path stays `node-gyp/*` (this PR no longer changes it — see codez PR #390222
which adds `node-gyp/*` to CloudFront's path_patterns in system_packages.tf).

Action pinning: tag-pinned per codez convention (100% of codez workflows use
tags, not SHAs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
harshita-gupta added a commit that referenced this pull request Apr 21, 2026
…ck, remove --acl (#18)

* workflows: swap softprops for gh CLI, add CloudFront reachability check, remove --acl

Three follow-up corrections to PR #17:

1. Remove `--acl public-read` from `aws s3 cp`.
   The bucket has `disable_confusing_acls = true` (BucketOwnerEnforced), which
   disables ACLs entirely. `BlockPublicAcls` + `IgnorePublicAcls` provide
   additional coverage. The ACL flag is silently ignored. The IAM role
   (`S3_ACCESS_MODE.PUT`) also doesn't grant `PutObjectAcl`. Reads go via
   CloudFront OAC, not public-S3.

2. Replace `softprops/action-gh-release` with GitHub's first-party `gh` CLI.
   `gh release upload` is pre-installed on GitHub-hosted runners, removes a
   third-party (single-maintainer) supply-chain dependency, and behaves
   equivalently with `--clobber`.

3. Add a post-upload CloudFront reachability check (`curl -fI`).
   If the CloudFront path_patterns allowlist doesn't include the key's prefix,
   Mac Bazel builds will silently 403. Failing the workflow here surfaces the
   issue before consumers hit it.

S3 path stays `node-gyp/*` (this PR no longer changes it — see codez PR #390222
which adds `node-gyp/*` to CloudFront's path_patterns in system_packages.tf).

Action pinning: tag-pinned per codez convention (100% of codez workflows use
tags, not SHAs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update .github/workflows/build-node-packages.yml

Co-authored-by: Eli Skeggs <1348991+skeggse@users.noreply.github.com>

* workflows: hoist matrix values to job-level env, drop expression substitution in run: blocks

Eli's review flagged `${{ matrix.arch }}` in a run: block as an injectable
pattern even though the matrix values are hardcoded and not truly exploitable.
Apply the pattern consistently across the whole workflow:

- Hoist PLATFORM, ARCH, BAZEL_ARCH, REPO to job-level env so each step can
  reference them as shell variables ($ARCH etc.) rather than GitHub Actions
  expressions (${{ matrix.arch }}). Job-level env evaluates matrix context
  since the job is instantiated per matrix combination, so this DRYs up the
  per-step env blocks.
- Rewrite every `run:` block to reference the job-level env vars. No more
  `${{ ... }}` expressions inside shell scripts.
- Secret references (GITHUB_TOKEN) remain step-scoped per least-privilege.
- Minor cleanup: collapse three separate `echo ... >> $GITHUB_ENV` lines into
  a single `{ ...; } >> "$GITHUB_ENV"` block.

Addresses Eli's inline comment on line 114 of the pre-hoist file.
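In sketch form (job and matrix values illustrative, not the workflow verbatim), the hoist looks like:

```yaml
# Illustrative fragment; job name and matrix values are placeholders.
jobs:
  build:
    strategy:
      matrix:
        include:
          - { platform: linux, arch: x64, bazel_arch: amd64 }
    # Job-level env is evaluated per matrix combination, so every step can
    # read these as plain shell variables instead of ${{ }} expressions.
    env:
      PLATFORM: ${{ matrix.platform }}
      ARCH: ${{ matrix.arch }}
      BAZEL_ARCH: ${{ matrix.bazel_arch }}
    steps:
      - run: echo "building for $ARCH ($BAZEL_ARCH)"  # no expression substitution
```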

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Harshita Gupta <harshita-gupta@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Eli Skeggs <1348991+skeggse@users.noreply.github.com>
